Producing video data

ABSTRACT

A method of incorporating additional video objects into source video data to produce output video data. The method includes retrieving source video data and data defining a segment size used by a distributor, into which video data is divided when transmitted. The method includes analyzing the source video data to identify selected frames of video material which include insertion zones which correspond to regions which are suitable for receiving an additional video object. The method includes identifying a boundary point of the source video data. The method includes embedding additional video objects into the selected frames, creating output video data which has a boundary which corresponds with the identified boundary point. The method includes generating metadata including information on said boundary point of the source video data to be replaced by the created output video data; and transmitting the output video data and the metadata to the distributor.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to producing video data. In particular, but not exclusively, the present invention relates to methods for, and for use in, incorporating one or more additional video objects into source video data to produce output video data, and to computer programs, computer program products and systems comprising apparatus adapted for performing such methods.

2. Description of the Related Technology

The broadcast industry has changed significantly in recent years. With the rise of the internet, digital network based streaming is becoming more popular and gradually replacing traditional television broadcasting. Prior to these changes, television programs were often recorded on video tape, either in a television studio or on location. With videotape there is no file structure; just linear picture information. The availability of digital technologies has resulted in media which are structured with directories and files. The number of processes between raw captured material and the final material is constantly increasing as, in the file-based domain, it is possible to create workflows by concatenating several processes.

With digital file processing, many new processes become possible that can be used to embed a branded product within a scene retrospectively. This may involve digitally post-processing a captured scene to add a representation of, for example, a branded drinks container on a table or shelf.

Adaptive Bitrate Streaming is a method for delivering video to Internet Protocol (IP) devices such as smartphones, tablets, connected TVs, laptops, etc. In adaptive bitrate streaming, video data is delivered in small segments or chunks (e.g. 2 s). Each segment is encoded into several bitrates, e.g. 400 Kbps, 700 Kbps and 1500 Kbps. Depending on the bandwidth and the capability of the device at any moment, the video data is switched to a higher or lower bitrate for delivering video data to the device. There are several adaptive bitrate formats available: HTTP Live Streaming (for Apple devices), HTTP Dynamic Streaming (by Adobe), Microsoft Smooth Streaming and Dynamic Adaptive Streaming over HTTP (DASH). Depending on the device requesting the video data, an appropriate video data format is delivered. Since these vary only by format, it is possible to produce or convert video data into different video data formats.

A known system allows for adverts to be placed in between chunks of video material, to replicate the traditional advertising technique of a “mid-roll” or advert break. However, this technique has the disadvantage that users may skip through the advert and also that the length of the video is extended.

Another known system provides embedded adverts within frames of video material. However, this suffers from the problem that a large amount of video data must be transferred, and targeting adverts at specific users requires that several versions of the embedded video be produced and then delivered. The large file size makes delivery of multiple files more time consuming and inefficient. Moreover, distributors are not set up to switch embedded adverts when an advert campaign launches and ends. Also, distributors are not set up for gathering analytics on the advert campaign.

It would be desirable to provide improved arrangements for producing video data.

SUMMARY

Embodiments of this invention seek to provide apparatus and methods for providing video material in which one or more additional video objects have been embedded for distribution using adaptive bitrate streaming.

According to one embodiment of the present invention, there is provided a method of incorporating one or more additional video objects into source video data to produce output video data, the method including: retrieving source video data, the source video data including frames of video material and data defining a segment size used by a distributor, into which video data is divided when transmitted by the distributor; analyzing the source video data to identify selected frames of video material which include one or more insertion zones, wherein the insertion zones correspond to one or more regions within the selected frames of video material which are suitable for receiving an additional video object; identifying at least one boundary point of the source video data, based on the data defining the segment size into which video data is divided prior to distribution, and based on the selected frames; embedding the one or more additional video objects into the selected frames; creating output video data which includes the one or more additional video objects and which has a boundary which corresponds with the identified at least one boundary point; generating metadata including information on said at least one boundary point of the source video data to be replaced by the created output video data; and transmitting the output video data and the metadata to the distributor.

According to another embodiment of the present invention, there is provided a method of incorporating output video data into source video data to produce final video data, the method comprising: providing the source video data, the source video data comprising: frames of video material; and data defining a segment size used by a distributor, into which the output video data is divided when transmitted by the distributor; receiving output video data from a remote location, the output video data including the one or more additional video objects inserted into insertion zones of the source video data, wherein the insertion zones correspond to one or more regions within selected frames of video material which are suitable for receiving an additional video object; receiving metadata from the remote location, the metadata including information on at least one boundary point of the source video data to be replaced by the created output video data, wherein the boundary point is based on the data defining the segment size into which the data is divided prior to distribution, and based on the selected frames; and splitting the source video data into source video data segments and placing one or more output video data segments therein based on the received output video data and the received metadata to create the final video data.

According to another embodiment of the present invention, there is provided a method of incorporating one or more additional video objects into source video data to produce output video data, the method comprising: retrieving source video data, the source video data comprising: frames of video material; and data defining a segment size used by a distributor, into which video data is divided when transmitted by the distributor; analyzing the source video data to identify selected frames of video material which include one or more insertion zones, wherein the insertion zones correspond to one or more regions within the selected frames of video material which are suitable for receiving an additional video object; identifying at least one boundary point of the source video data, based on the data defining the segment size into which video data is divided prior to distribution, and based on the selected frames; embedding the one or more additional video objects into the selected frames; creating output video data comprising: the selected frames into which the one or more additional video objects are embedded; and one or more additional frames of video material, which are located between the selected frames and the at least one boundary point, if the selected frames are located within a threshold of the at least one boundary point; generating metadata including information on the source video data to be replaced by the created output video data; and transmitting the output video data and the metadata to the distributor.

According to another embodiment of the present invention, there is provided a system for incorporating one or more additional video objects into source video data to produce output video data, comprising: a memory configured to store retrieved source video data, the source video data comprising: frames of video material; and data defining a segment size used by a distributor, into which video data is divided when transmitted by the distributor; a processor configured to: analyze the source video data to identify selected frames of video material which include one or more insertion zones, wherein the insertion zones correspond to one or more regions within the selected frames of video material which are suitable for receiving an additional video object; identify at least one boundary point of the source video data, based on the data defining the segment size into which video data is divided prior to distribution, and based on the selected frames; embed the one or more additional video objects into the selected frames; create output video data which includes the one or more additional video objects and which has a boundary which corresponds with the identified at least one boundary point; generate metadata including information on said at least one boundary point of the source video data to be replaced by the created output video data; and transmit the output video data and the metadata to the distributor.

According to another embodiment of the present invention, there is provided a non-transitory computer-readable medium having computer executable instructions stored thereon, which when executed by a computing device cause the computing device to perform a method of incorporating one or more additional video objects into source video data to produce output video data, the method comprising: retrieving source video data, the source video data comprising: frames of video material; and data defining a segment size used by a distributor, into which video data is divided when transmitted by the distributor; analyzing the source video data to identify selected frames of video material which include one or more insertion zones, wherein the insertion zones correspond to one or more regions within the selected frames of video material which are suitable for receiving an additional video object; identifying at least one boundary point of the source video data, based on the data defining the segment size into which video data is divided prior to distribution, and based on the selected frames; embedding the one or more additional video objects into the selected frames; creating output video data which includes the one or more additional video objects and which has a boundary which corresponds with the identified at least one boundary point; generating metadata including information on said at least one boundary point of the source video data to be replaced by the created output video data; and transmitting the output video data and the metadata to the distributor.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic diagram showing a system in accordance with some embodiments.

FIG. 2 illustrates an example of adaptive bitrate streaming.

FIG. 3 illustrates a sequence timing diagram showing the flow of messages associated with adding one or more additional video objects into source video data to produce output video data in accordance with some embodiments.

FIG. 4 illustrates a metadata file in a VMAP format.

FIG. 5 illustrates a sequence diagram for inserting output video data into source video data in accordance with some embodiments.

FIG. 6 illustrates a method for incorporating one or more additional video objects into source video data to produce output video data in accordance with some embodiments.

FIG. 7 illustrates multiple versions of the output video data in accordance with some embodiments.

FIG. 8 illustrates a targeting process in accordance with some embodiments.

FIG. 9 illustrates a sequence timing diagram showing the flow of messages associated with placing output video data into source video data to produce final video data in accordance with some embodiments.

FIG. 10 illustrates a sequence diagram for inserting output video data into source video data in accordance with some embodiments.

DETAILED DESCRIPTION OF CERTAIN INVENTIVE EMBODIMENTS

FIG. 1 is a schematic diagram showing a video processing system 100 in accordance with some embodiments. The subsystems of the video processing system are connected via one or more data communication networks (not shown). In some embodiments, the subsystems are connected to each other via the Internet.

Subsystem 102, which is referred to herein as the “source” hub, performs, amongst other things, video data analysis in the video processing system 100. The source hub 102 may retrieve source video data as one or more digital files, supplied, for example, on video or data tape, on digital versatile disc (DVD), over a high-speed computer network, via the network, on one or more removable disc drives or in other ways. In one embodiment, the source video data is provided by a distributor. In another embodiment, the source video data is provided by a content owner 104.

The source video data comprises frames of video material. Contiguous frames of video material set in one location are known as shots.

In some embodiments, the source hub 102 comprises a video data analysis module, which performs pre-analysis in relation to source video data. Such analysis may be performed using appropriate software which allows products to be placed digitally into existing video material.

The pre-analysis may be fully automated in that it does not involve any human intervention.

In some embodiments, the video data analysis module is used to perform a pre-analysis pass in relation to the source video data to identify one or more shots in the source video data. This may involve using shot detection and/or continuity detection, which will now be described in more detail.

Pre-analysis may comprise using a video format detection algorithm to identify the format of the source video data and, if necessary, convert the source video data into a format capable of receiving one or more additional video objects.

Pre-analysis may comprise using a shot detection function to identify the boundaries between different shots in video data. For example, the video data analysis module 102a automatically detects “hard” and “soft” cuts between different shots, which correspond to hard and soft transitions respectively. Hard cuts correspond to an abrupt change in visual similarity between two consecutive frames in the video data. Soft cuts correspond to the beginning or the end of a soft transition (for example wipe and cross fading transitions), which is characterized by a significant but gradual change in visual appearance across several frames.

Pre-analysis may comprise using a continuity detection function to identify similar shots (once detected) in video data. This can be used to maximize the likelihood that each (similar) shot in a given scene is identified; this may be a benefit in the context of digital product placement. For each detected shot, a shot similarity algorithm automatically detects visually similar shots within the source video data. The similarity detection is based on matching between frames, which captures an overall global similarity of background and lighting. It may be used to identify shots which are part of a given scene in order to speed up the process of selecting shots that should be grouped together on the basis that they are similar to each other.

Pre-analysis may comprise using an object and/or locale template recognition function and/or a face detection and recognition function. Object template recognition involves identifying objects which reappear across, for example, multiple episodes of a television program, and which are appropriate for digital product placement, so that they can automatically be found in other episodes of the program. Locale template recognition allows a template to be built for a certain locale in a television program and the appearance of the locale to be detected automatically in subsequent episodes of the program. A locale is a location (e.g. a room) which appears regularly in the program across multiple episodes. Face detection and recognition involve identifying characters which, for example, reappear across multiple episodes of a television programme. This allows for characters to be associated with a particular digital product placement.

Pre-analysis may comprise using a tracking (such as 2D point tracking) function to detect and track multiple point features in video data. This involves using a tracking algorithm to detect and track feature points between consecutive frames. Feature points correspond to locations within an image which are characteristic in visual appearance; in other words they exhibit a strong contrast (such as a dark corner on a bright background). A feature is tracked by finding its location in the next frame by comparing the similarity of its neighboring pixels.

Pre-analysis may comprise using a planar tracking function to follow image regions over time and determine their motion under the assumption that the surface is a plane. This may involve tracking 2D regions defined by splines, calculating their 2D translation, rotation, scale, shear and foreshortening through the video data. This process creates motion information that can be exploited by other video analysis functions.

Pre-analysis may comprise using a motion-from-features detection function which involves using the tracked 2D points to determine 2D motion in the video data. Given a set of tracked feature points, motion-from-features detection involves detecting which points move together according to the same rigid motion.

Pre-analysis may comprise using a 3D tracking function which involves using the tracked 2D points to determine 3D motion in the video data. 3D tracking involves extracting geometric information from a video shot, for example the camera focal distance, position and orientation as it moved. The other information recovered is the 3D shape of the viewed scene, represented as 3D points.

Pre-analysis may comprise using an autokeying function to separate background and foreground areas, allowing products to be digitally placed while respecting any occluding (foreground) objects to provide a natural-looking embedded image. When a foreground object moves in front of the background where it is desired to place a product digitally, the area into which the product is to be placed should stop at the boundary between the foreground and background areas. In general, the digitally placed product should cover the “mask” area of the background data. The correct mask can be especially difficult to create when the edge of the foreground object is very detailed or blurred. The autokey algorithm uses the planar tracker to create motion information so that known background or foreground areas can be propagated forwards and backwards through the video in time.

Pre-analysis may comprise region segmentation, which is used to split the video data into regions that span both time and space. Region segmentation involves using an algorithm that detects regions of similar pixels within and across frames of a given video scene, for example to select point features for motion estimation.

Pre-analysis may comprise using a black border detection function, which is used to find the borders around the video image part of video data. This involves using an algorithm that detects the presence of black bars around the frames in a video sequence, which can interfere with various video processing algorithms.

Pre-analysis may comprise proxy creation, which involves creating a lower resolution and/or compressed version of the source video data.

The source hub analyses the source video data to find regions within the source video data which are suitable for receiving one or more additional video components. The regions within the source video data which are suitable for receiving additional video data are known as insertion zones.

In one embodiment, the source hub 102 is also used for creative work in the video processing system 100.

The source hub 102 is provided with modules, such as a tracking module, which may be used to determine how the position of a digitally placed product should vary when added into video material, for example to take into account any movement of the camera that recorded the video material. Tracking may be automated and/or may involve human intervention.

The source hub 102 also comprises a masking module. The masking module is used to assess how to handle occlusion (if any) of a product to be digitally placed in video material, having regard to other objects that may already be present in the video material. Masking assessment may be automated and/or may involve human intervention.

The source hub 102 also comprises an appearance modelling module. The appearance modelling module is used to provide a desired appearance in relation to the digitally placed product, for example using blur, grain, highlight, 3D lighting and other effects. Appearance modelling may be automated and/or may involve human intervention.

Subsystem 106 represents a data store which is suitable for storing video data.

Subsystem 108 represents an encoder for encoding video into one or more bitrates. The encoder 108 is capable of receiving video data from the source hub 102, compressing the video and converting the video data into one or more formats.

Subsystems 110 and 112 represent an origin server and a content delivery network (CDN) respectively. CDNs allow data to be transmitted more efficiently over the internet and are well known in the art.

Subsystem 114, which is referred to as the “distributor”, performs, amongst other things, video distribution. An example of a distributor is YOUTUBE®. The distributor allows users to access video files through the network.

Subsystems 122, 124 and 126 represent a stream manager, an Ad Decision System and an Ad Policy Manager respectively, and are described in more detail below.

Traditional video streaming techniques such as Real-Time Streaming Protocol (RTP) and Windows Media HTTP Streaming Protocol (MS-WMSP) involved a server sending a steady stream of data packets, encoded at a uniform bitrate, to a client, such as 116, 118 or 120. The server sends the data packets to the client only at the bitrate at which the video is encoded. For example, if a source video is encoded at 500 kbps, then the video will be streamed at 500 kbps. Further, the server only sends enough data packets to fill the client buffer. The client buffer typically holds between 1 and 10 seconds of video. This means that even if a user pauses a streaming video, only between 1 and 10 seconds of video material is downloaded at the client.

Progressive download advanced traditional streaming techniques by allowing streamed video to be played at the client before the whole video file has been downloaded. Progressive download is supported by most media players and platforms and operates based on a simple download from an HTTP web server. Unlike traditional streaming, if a video is paused, the remainder of the video will continue to download to the client.

FIG. 2 illustrates adaptive bitrate streaming. Adaptive bitrate streaming works by splitting video data 202 into many “chunks” or “segments”, such as A2, B2, C2, etc. The segments are uniform and are typically between 2 s and 10 s long. The exception to the rule regarding uniform segments is that the last segment size may not correspond to the size of the other segments. Alternatively, the segment size may be defined in terms of the number of frames of video material. In one embodiment, the segment size may be between 50 and 250 frames. The segment size is determined by a distributor, and varies depending on the distributor's delivery platform and the type of content (live, VoD etc.). The distributor provides data to the source hub defining a segment size used by the distributor, into which final video data is divided before transmission.
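The segment arithmetic implied here can be made concrete. The following is a minimal sketch, not taken from the patent and with hypothetical names, of how the boundaries implied by a distributor's segment size might be computed, including the shorter final segment:

    # Minimal sketch: segment start times implied by a distributor's
    # segment size. The last segment may be shorter than the others.
    def segment_boundaries(duration_s, segment_s):
        boundaries = []
        t = 0.0
        while t < duration_s:
            boundaries.append(t)
            t += segment_s
        return boundaries

    # A 9 s video with 2 s segments: segments start at 0, 2, 4, 6 and 8 s,
    # and the final segment is only 1 s long.
    print(segment_boundaries(9.0, 2.0))  # [0.0, 2.0, 4.0, 6.0, 8.0]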

Each segment is encoded into at least two bitrates by an encoder. In FIG. 2, the video data is encoded into three bitrates, represented by 204, 206 and 208. In some embodiments, each segment is encoded into between 3 and 14 different bitrates of different quality. In one embodiment, each segment is encoded into three bitrates: 400 Kbps, 700 Kbps and 1500 Kbps.
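By way of illustration only, this encoding ladder can be thought of as a matrix of (segment, bitrate) encode jobs. The sketch below assumes ffmpeg as the encoder and uses hypothetical file names; the text does not prescribe any particular tool:

    # Minimal sketch: one encode job per (segment, bitrate) pair.
    # The bitrates mirror the example above; ffmpeg is an assumption.
    BITRATES_KBPS = [400, 700, 1500]

    def encode_jobs(segment_files):
        jobs = []
        for seg in segment_files:
            for kbps in BITRATES_KBPS:
                out = seg.replace(".mp4", "_%dk.mp4" % kbps)
                jobs.append(["ffmpeg", "-i", seg,
                             "-c:v", "libx264", "-b:v", "%dk" % kbps, out])
        return jobs

    # encode_jobs(["B.mp4"]) yields three jobs producing B_400k.mp4,
    # B_700k.mp4 and B_1500k.mp4.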

The client can now choose between segments encoded at different bitrates depending on the available network capacity for the client. If the client has a high bandwidth capacity then the segments encoded at a high bitrate are selected and the video data is streamed at a higher quality. If the network bandwidth is reduced, then the client can select a lower bitrate segment. As with progressive download, the video data will continue to download if the user pauses a video. In the example of FIG. 2, the first two segments are selected from the encoded video data 204. The next segment is selected from the encoded video data 206, etc.
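The switching decision can be sketched as follows; this is an illustrative reading of the behavior described above, not a specific client implementation, and the ladder values simply reuse the example bitrates:

    # Minimal sketch: pick the highest rendition that fits the measured
    # bandwidth, falling back to the lowest rendition as a floor.
    def pick_bitrate(available_kbps, ladder=(400, 700, 1500)):
        fitting = [b for b in ladder if b <= available_kbps]
        return max(fitting) if fitting else min(ladder)

    print(pick_bitrate(900))  # 700
    print(pick_bitrate(300))  # 400 (lowest rendition as a floor)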

Each segment begins with an Instantaneous Decoder Refresh (IDR) frame. The IDR frame is utilized as a reference frame for the other video frames in the segment. As the remaining frames in the segment reference the IDR frame, compression techniques may be employed to compress the segment video data. Subsequent frames can be decoded using the IDR frame and do not need any other information prior to it. The remaining frames in a segment would not be played if the IDR frame were missing. In one embodiment, the segment size corresponds to the IDR frame interval.

Adaptive bitrate streaming allows the client to download the video data at a rate suitable to the current network conditions and therefore reduces the risk that there will not be sufficient video segments in the client buffer to continue playing the video. Adaptive bitrate streaming provides a platform for inserting output video data, which includes one or more additional video objects, such as advertising components, within the source video data to produce final video data.

FIG. 3 shows a sequence timing diagram showing the flow of messages associated with adding one or more additional video objects into source video data to produce output video data in accordance with some embodiments.

In step 3a, the source hub 302 retrieves source video data. In one embodiment, the source video data is retrieved from the distributor. In another embodiment, the source video data is retrieved from the content producer 304. The source video data includes frames of video material along with data defining a segment size used by the distributor, into which video data is divided prior to distribution. The segment size may be defined as either a time period or a number of frames. In one embodiment the source video data comprises information on a minimum permitted length of insertion zone. If the insertion zone is below the minimum permitted length, then additional video objects will not be embedded within the source video data. The source video data further comprises information on

In step 3b, the source hub 302 analyses the video material to identify “insertion zones”, or regions within the source video data which are suitable for receiving one or more additional video objects.

In one embodiment, following an identification of an insertion zone in a shot of source video material, the remainder of the source video material is analyzed to identify whether the insertion zone appears in one or more other shots in the source video material. Frames in which the insertion zones are identified are known as selected frames.

In step 3c, the source hub 302 calculates a location of at least one boundary point, based on the data defining the segment size into which video data is divided prior to distribution, and based on the selected frames. The boundary point corresponds to a start point of a segment in the final video data produced by the distributor prior to distribution. The boundary point represents the earliest boundary of a segment in which the selected frames will occur in the final video data. In one embodiment, the boundary point comprises an IDR frame. In one embodiment the source video data comprises data on the frame rate of the source video data. In one embodiment, the source hub provides the output video data at a different frame rate compared with the source video data. In that case, the source hub needs to incorporate the difference between the frame rates into the step of identifying the boundary point and the provision of the output video data.

In an example, if the selected frames in which insertion zones have been identified occur between 8.5 s and 9.5 s from the start of the source video data, and the distributor provides information that the segment size is 2 s, then the source hub 302 calculates that the boundary point will, in this example, occur at 8 s.

In another example, if the selected frames occur between 9.5 s and 10.5 s and the segment size is 2 s, then again the boundary point will occur at 8 s.
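Both examples reduce to rounding the start of the selected frames down to the nearest segment boundary. A minimal sketch, assuming times measured in seconds from the start of the source video data:

    # Minimal sketch: the boundary point is the start of the earliest
    # segment containing a selected frame.
    def boundary_point(first_selected_s, segment_s):
        return (first_selected_s // segment_s) * segment_s

    print(boundary_point(8.5, 2.0))  # 8.0
    print(boundary_point(9.5, 2.0))  # 8.0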

The boundary point is calculated to ensure that the output video data produced by the source hub 302 adheres to the requirements of the distributor, such as their adaptive bitrate streaming settings. As there is an IDR frame at each boundary point, it is important to replace a whole segment of final video data, rather than a portion of the segment. By calculating a boundary point of the segment, the source hub 302 can calculate how many more frames, in addition to the selected frames, need to be included in the output video data, such that the provided output video data corresponds to a multiple of the segment size. Replacing a whole segment, rather than a portion of the segment, means that it is easier for the output video data to be included into the source video data by the distributor. In one embodiment, two boundary points are identified for each segment of source video data: the first corresponds to the first frame of the segment and comprises an IDR frame; the second boundary point corresponds to the last frame of the segment, which is located one frame prior to the next IDR frame.

In step 3d, the one or more additional video objects are embedded into the selected frames. In one embodiment, the embedding of the one or more additional objects into the source video data is carried out at the source hub 302 itself. In an alternative embodiment, the source video data and the one or more additional video objects are provided to one or more remote hubs, such as a creative hub, to perform this embedding. Providing a hub in a remote location allows resources to be more efficiently distributed, for example, by locating the remote hub near a larger pool of labor.

In step 3e, output video data is created. The output video data comprises one or more output video data segments. The output video data may include one or more frames which are not selected frames. The reason for this is that it is important that the boundary of the output video data corresponds to a boundary of the final video data which will be segmented prior to distribution by the distributor. In the example described above, in which a collection of selected frames in which insertion zones are present is located between 8.5 s and 9.5 s, and the segment size is 2 s, the boundary point will occur at 8 s. To ensure that the output video data has a boundary which corresponds to the boundary point, video frames which occur between 8 s and 8.5 s will also be included in the output video data, even though in this example they do not include an insertion zone.

In the second example described above, in which the insertion zones are present in frames occurring between 9.5 s and 10.5 s, the relevant boundary point will also be 8 s. The reason that the boundary point is 8 s rather than 10 s is that a portion of the output video data will occur in the segment between 8 s and 10 s, and that portion will be encoded relative to the IDR frame occurring at 8 s. In this example, the video frames between 8 s and 9.5 s will also be included in the output video data. In addition, the frames between the end of the selected frames and the next boundary point will also be included, which in this example is the frames between 10.5 s and 12 s. In one embodiment, the size of the output video data corresponds to the size of the segment as provided by the distributor. In another embodiment the output video data corresponds to a multiple of the size of the segment as provided by the distributor.
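In other words, the selected-frame interval is extended outwards to whole segment boundaries so that the output video data spans a multiple of the segment size. A minimal sketch of that padding calculation, under the same assumptions as above:

    # Minimal sketch: extend the selected-frame interval to whole
    # segment boundaries.
    import math

    def output_span(start_s, end_s, segment_s):
        padded_start = math.floor(start_s / segment_s) * segment_s
        padded_end = math.ceil(end_s / segment_s) * segment_s
        return padded_start, padded_end

    print(output_span(8.5, 9.5, 2.0))   # (8.0, 10.0): one 2 s segment
    print(output_span(9.5, 10.5, 2.0))  # (8.0, 12.0): two 2 s segments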

Since the output video data is much shorter in duration than the source video data, the file size of the output video data is much smaller, and hence transmitting the output video data is easier than transmitting the source video data.

In step 3f, metadata is created which provides information on the at least one boundary point of the source video data to be replaced by the output video data. The metadata comprises timing information, including the timing of the boundary point. In one embodiment the metadata comprises timing information on the length of time of the output video data. In one embodiment, the metadata includes information on the number of frames of output video data.

In one embodiment, the metadata includes data on a tracking pixel. The tracking pixel can be configured to transmit a message as a result of the output video data being played. The tracking pixel could be placed at the start of the output video data, or at any point throughout the course of the output video data. The tracking pixel enables data to be gathered on the number of times that a particular advert, in output video data, has been played, which can be utilized for billing purposes. If more than one tracking pixel is included in the metadata, for example tracking pixels configured to fire at the start, middle and end of the output video data, then data can be collected on the number of times that the start, the middle and the end of the output video data have been played.
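A tracking pixel of this kind is commonly realized as an HTTP request issued by the player when a playback milestone is reached. The sketch below is illustrative only; the beacon URL, event names and video identifier are hypothetical:

    # Minimal sketch: fire a tracking pixel by issuing an HTTP GET to a
    # beacon URL when a playback milestone is reached.
    import urllib.request

    def fire_tracking_pixel(event, video_id):
        url = ("https://analytics.example.com/pixel"
               "?event=%s&video=%s" % (event, video_id))
        urllib.request.urlopen(url)  # the response body is ignored

    # e.g. fire_tracking_pixel("start", "output-video-42") at playback start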

In one embodiment the metadata is in a Video Multiple Ad Playlist (VMAP) format. VMAP is a standard from the Interactive Advertising Bureau (IAB); it is an XML format that is typically used to describe the structure of advert inventory insertion, for traditional pre-roll and mid-roll advertisements. A pre-roll advertisement is a collection of video frames, which are not derived from the source video data, which are configured to play before the final video data is provided to a user. A mid-roll advertisement is similar to a pre-roll advertisement, but wherein the advertisement is played in between a break in frames of the final video data.

VMAP files are used by companies such as DoubleClick™, Google® and Adobe®. VMAP is used in conjunction with another standard called Video Ad Serving Template (VAST), a further XML based standard used for serving adverts to online digital video players.

As these standards are established in the industry, they can be utilized to provide advertisements to a wide audience. The metadata will not be used in exactly the same way as a VMAP file would be used to specify mid-roll and pre-roll information, as the output video data will not be used in the same way as traditional pre-rolls or mid-rolls. FIG. 4 shows an example of a metadata file in a VMAP format.

Marker 1 in FIG. 4 shows information on the output video data that is to be inserted into the source video data. The metadata provides information about the output video data and about the additional video data which is embedded in the output video data. The metadata may provide information on the location of the output video data, such as a URL address.

Marker 2 in FIG. 4 shows information on the time code at which to insert the output video data into the source video data. Marker 3 in FIG. 4 shows information on the duration of the output video data. For example, if the output video data duration is 14 seconds, and the standard segment size used by the distributor is 2 seconds, then the output video data will consist of 7 segments. Marker 4 in FIG. 4 shows information on the output video data video file. Marker 5 in FIG. 4 shows information on a firing pixel that fires when a specified instant within the output video data is reached. These instants could be the start of the output video data, 1 second into the output video data or the end of the output video data. In one embodiment, the metadata is provided to a Stream Manager. The Stream Manager may manage the playlist and streams of the client. In another embodiment, since the metadata is in the VMAP format, the metadata is used directly by the player.
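To make the structure concrete, the sketch below generates a single VMAP-style ad break. The element names follow the public IAB VMAP 1.0 schema, but the exact fields used by the file of FIG. 4 are an assumption here, as is the URL:

    # Minimal sketch: one VMAP-style ad break pointing at output video
    # data hosted at a (hypothetical) URL.
    def vmap_ad_break(time_offset, video_url):
        return (
            '<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap"'
            ' version="1.0">\n'
            '  <vmap:AdBreak timeOffset="%s" breakType="linear">\n'
            '    <vmap:AdSource>\n'
            '      <vmap:AdTagURI><![CDATA[%s]]></vmap:AdTagURI>\n'
            '    </vmap:AdSource>\n'
            '  </vmap:AdBreak>\n'
            '</vmap:VMAP>' % (time_offset, video_url)
        )

    print(vmap_ad_break("00:00:08.000", "https://cdn.example.com/B_prime.mp4"))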

In step 3g, the output video data and the metadata are transmitted to the distributor.

FIG. 5 shows a sequence diagram for inserting output video data into source video data. In S500, source video data is retrieved. In S502, boundary points are identified within the source video data, based on the data of the segment size provided by the distributor. The boundary points represent the segments into which the video data will be divided prior to distribution. The source video data is not necessarily divided at this stage.

In S504, the source hub analyses the source video data to identify one or more insertion zones in the source video data. Selected frames in which the one or more insertion zones are identified are represented by b′ and de′. Selected frames b′ correspond with segment B; therefore the boundary point occurs at the boundary between segments A and B. Selected frames de′ correspond with segments D and E; therefore the boundary point occurs at the boundary between segments C and D.

In S506, the additional video data is embedded within the selected frames and output video data is created which includes the one or more additional video objects. The boundaries of the output video data correspond with the identified at least one boundary point. In S506, the segment B′ includes the one or more selected frames in which the additional video objects have been embedded, and also includes one or more frames in which additional video objects have not been added. The size of the segment B′ corresponds to the size of segment B, which will be replaced by B′ in the final video data. By matching the size of the segment B′ with B, the process of replacing the segment is made easier.

Segments D′ and E′ include the one or more selected frames in which the additional video objects have been embedded, and also include one or more frames in which additional video objects have not been added. The sizes of the segments D′ and E′ correspond to the sizes of segments D and E, which will be replaced by D′ and E′ in the final video data.

S508 shows the final video data, with segments of the output video data, B′, D′ and E′, replacing segments of the source video data, B, D and E. Information on where the segments of the output video data, which include B′, D′ and E′, are to be placed is transmitted to the distributor in the metadata.

Before distribution of the final video data shown in S508, the final video data is split into the segments at the distributor. The segments are encoded into multiple bitrates and delivered to the client using adaptive bitrate streaming techniques.
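The replacement itself amounts to substituting segments by position. A minimal sketch of the splice in S508, with segments represented simply by identifiers:

    # Minimal sketch: replace source segments, by index, with the output
    # video data segments named in the metadata.
    def splice(source_segments, replacements):
        # replacements maps segment index -> output segment identifier
        return [replacements.get(i, seg)
                for i, seg in enumerate(source_segments)]

    print(splice(["A", "B", "C", "D", "E"], {1: "B'", 3: "D'", 4: "E'"}))
    # ["A", "B'", "C", "D'", "E'"]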

In one embodiment, the output video data is divided into segments and encoded at the source hub. Each segment is encoded into multiple bitrates at the source hub as shown in FIG. 6. FIG. 6 shows segments B′, D′ and E′ each being encoded into a first version of the output video data at a first bitrate, represented by B1, D1 and E1. FIG. 6 also shows the output video data being encoded into one or more further bitrates, represented by B2, D2 and E2 and B3, D3 and E3. The encoded output video data is provided to the distributor, and the distributor can insert the segments within the source video data based on the corresponding metadata.

Targeted advertising involves providing different advertising components to different users depending on characteristics of the user. FIG. 7 shows an example of multiple versions of the output video data being created in accordance with some embodiments, each version of the output video data comprising one or more additional video objects which are different to the one or more additional objects which are inserted into the other versions of the output video data. Step S700 corresponds to step S504 shown in FIG. 5, in which insertion zones b′ and de′ have been identified in the source video data.

In step S702, three versions of the output video data are created: 71, 72 and 73. In output video data 71, a first advertising component corresponding to a first set of additional video data is inserted to produce a first version of the output video data. In output video data 72, a second advertising component corresponding to a second set of additional video data is inserted to produce a second version of the output video data. In output video data 73, a third advertising component corresponding to a third set of additional video data is inserted to produce a third version of the output video data. The three versions of the output video data are transmitted to the distributor.

The method described above and shown in FIG. 7 enables output video data with several product variations to be created. For example, if there are 3 variations of additional video data to be embedded into source video data to create output video data 6 s long, within source video data of 22 minutes, the sum of the lengths of the three versions of the output video data is still 6 s × 3 = 18 s, compared with 1320 seconds for the source video data. It is conceivable that there are different brands placed in the same content to be delivered to different demographics, locations or users.

In embodiments, the distributor comprises an Ad Policy Manager, an Ad Decision System and a stream manager. The Ad Policy Manager sets general rules as to when to deliver each variation; for example, an affluent area may be targeted with adverts for high-end cars. The Ad Decision System analyses the characteristics of a profile of a user to determine which rules set by the Ad Policy Manager apply. The stream manager operates in conjunction with the user's client device and provides a manifest file to the client device, which dictates the contents to be provided to the client device.

FIG. 8 illustrates a targeting process in accordance with some embodiments. The source hub 802 creates three versions of output video data: 804, 806 and 808. Output video data 804 comprises a first one or more additional video objects. Output video data 806 comprises a second one or more additional video objects. Output video data 808 comprises a third one or more additional video objects. The three versions of the output video data, 804, 806 and 808, are encoded and cached by the distributor. When a user 818 requests video content from the distributor player 816, the player requests a manifest file from the Stream Manager 812. The Stream Manager contacts the Ad Decision System 814, which then analyzes the profile of the user 818 and selects a targeted advert, based on the rules set by the Ad Policy Manager 810, which has access to the output video data 804, 806, 808. The corresponding encoded output video data in which the selected advert is embedded is then added to the manifest file. The stitched manifest file is then sent to the player, which then seamlessly plays the output video data comprising the selected advert as part of the final video data. The Ad Decision System may select a different advert for each user 818, 820, 822.
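The decision step can be sketched as a simple rule lookup. The rules and profile fields below are hypothetical stand-ins for the Ad Policy Manager's configuration; the actual decision logic is not specified in the text:

    # Minimal sketch: pick one version of the output video data per user
    # profile, using the first matching policy rule.
    def select_version(profile, policy_rules, default="output_default"):
        for predicate, version in policy_rules:
            if predicate(profile):
                return version
        return default

    rules = [
        (lambda p: p.get("region") == "affluent_area", "output_804"),
        (lambda p: p.get("device") == "mobile", "output_806"),
    ]
    print(select_version({"region": "affluent_area"}, rules))  # output_804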

Targeted advertising may satisfy specific geographies, specific devices, or specific viewers. In one embodiment, a different version of the output video data will exist for each targeted version in a single advertising campaign. These versions will differ in the advertising content embedded within the output video data. For example, output video data version 1 could include a Salsa flavor for a product and output video data version 2 could include a Quezo flavor for a product. Which version of the output video data is provided to a viewer is decided by the Ad Decision System.

FIG. 9 shows a sequence timing diagram showing the flow of messages associated with placing output video data into source video data to produce final video data in accordance with some embodiments.

In step 9a, source video data comprising frames of video material and data defining a segment size used by a distributor is provided to the source hub. The source video data may be created by one or more content producers and provided to the distributor. The source video data comprises frames of video material and data defining a segment size used by a distributor, into which the output video data is divided when transmitted by the distributor. As described above, the source video data is divided into segments prior to distribution for use with adaptive bitrate streaming.

In step 9b, the distributor receives output video data from a remote location, such as a source hub, the output video data including the one or more additional video objects inserted into insertion zones of the source video data, wherein the insertion zones correspond to one or more regions within selected frames of video material which are suitable for receiving an additional video object. In one embodiment, the additional video object is an advertising component. The output video data may also include one or more frames of video material in which an additional video object has not been included; the reason for including such frames is to extend the boundary of the output video data to correspond with one or more boundary points of segments into which data is divided prior to distribution.

In step 9c, the distributor receives metadata from the remote location, the metadata including information on at least one boundary point of the source video data to be replaced by the created output video data. The metadata comprises timing information, including the timing of the boundary point. In one embodiment the metadata comprises timing information on the length of time of the output video data. In one embodiment, the metadata includes information on the number of frames of output video data.

In one embodiment, the metadata includes data on a tracking pixel. The tracking pixel can be configured to transmit a message as a result of the output video data being played. The tracking pixel could be placed at the start of the output video data, or at any point throughout the course of the output video data. The tracking pixel enables data to be gathered on the number of times that a particular advert has been played. If more than one tracking pixel is included in the metadata, for example tracking pixels configured to fire at the start, middle and end of the output video data, then data can be collected on the number of times that the start, the middle and the end of the output video data have been played. In one embodiment the metadata is in a Video Multiple Ad Playlist (VMAP) format.

The boundary point is based on the data defining the segment size into which the data is divided prior to distribution, and based on the selected frames.

In step 9d, the source video data is split into source video data segments. The size of the source video data segments corresponds to the segment size data provided to the remote location. The one or more output video data segments are placed into the source video data segments based on the received output video data and the received metadata to create the final video data.

In one embodiment, segments of the final video data are encoded into a first version of the final video data at a first bitrate, and one or more additional versions of the segments of the final video data are encoded at one or more additional bitrates. In some embodiments, each segment is encoded into between 3 and 14 different bitrates of different quality. In one embodiment, each segment is encoded into three bitrates: 400 Kbps, 700 Kbps and 1500 Kbps.

The client can now choose segments encoded at different bitrates depending on the available network capacity for the client. If the client has a high bandwidth capacity then the segments encoded at a high bitrate are selected and the video data is streamed at a higher quality. If the network bandwidth is reduced, then the client can select a lower bitrate segment.

In one embodiment, multiple versions of the output video data are created. A first advertising component corresponding to a first set of additional video data is inserted to produce a first version of the output video data. A second advertising component corresponding to a second set of additional video data is inserted to produce a second version of the output video data. A third advertising component corresponding to a third set of additional video data is inserted to produce a third version of the output video data. There may be more than three versions of the output video data created. In embodiments, the Ad Decision System analyzes the profile of a user and selects a targeted advert, based on the rules set by the Ad Policy Manager. The corresponding encoded output video data in which the selected advert is embedded is then added to the manifest file. The stitched manifest file is then sent to the player, which then seamlessly plays the output video data comprising the selected advert as part of the final video data. The Ad Decision System may select a different advert for each user.

Targeted advertising may satisfy specific geographies, specific devices, or specific viewers. In one embodiment, a different version of the output video data will exist for each targeted version in a single advertising campaign. These versions will differ in the advertising content embedded within the output video data. For example, output video data version 1 could include a Salsa flavor for a product and output video data version 2 could include a Quezo flavor for a product. Which version of the output video data is provided to a viewer is decided by the Ad Decision System.

FIG. 10 illustrates a sequence diagram for inserting output video data into source video data in accordance with some embodiments. Steps S1000, S1002 and S1004 correspond to steps S500, S502 and S504. In S1000, source video data is retrieved. In S1002, boundary points are identified within the source video data, based on the data of the segment size provided by the distributor. The boundary points represent the segments into which the video data will be divided prior to distribution. The source video data is not necessarily divided at this stage.

In S1004, the source hub analyses the source video data to identify one or more insertion zones in the source video data. Selected frames in which the one or more insertion zones are identified are represented by b′ and de′. Selected frames b′ correspond with segment B; therefore the boundary point occurs at the boundary between segments A and B. Selected frames de′ correspond with segments D and E; therefore the boundary point occurs at the boundary between segments C and D.

In S1006, the additional video data is embedded within the selected frames and output video data is created which includes the one or more additional video objects. However, in contrast to S506, the boundaries of the output video data do not necessarily correspond with the identified at least one boundary point. Rather, a boundary of an output video data segment will be extended if the selected frames have a predetermined proximity relationship with the at least one boundary point. The boundary of the output video data can be adjusted by including, in the output video data segment, one or more additional frames which do not include the one or more additional objects. Such additional frames will be included in the output data segments if the selected frames have a predetermined proximity relationship with the at least one boundary point. In one embodiment, the predetermined proximity relationship is based on a number of frames. In one embodiment, one or more additional frames which do not include one or more additional objects will be included in the output data segments if the selected frames are within 100 frames of the boundary point. In another embodiment, the additional frames will be included if the selected frames are within 10 frames of the boundary point.

In one embodiment, the predetermined proximity relationship is based on a period of time. In one embodiment, one or more additional frames which do not include one or more additional objects will be included in the output data segments if the selected frames are within 0.5 seconds of the boundary point. In another embodiment, the additional frames will be included if the selected frames are within 0.1 seconds of the boundary point.
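Either form of the test reduces to a distance check against the boundary point. A minimal sketch, here expressed in frames, with the threshold value chosen only for illustration:

    # Minimal sketch: pad the output segment with extra frames only when
    # the selected frames fall within a threshold of the boundary point.
    def should_pad(selected_start_frame, boundary_frame, threshold_frames=100):
        return abs(selected_start_frame - boundary_frame) <= threshold_frames

    print(should_pad(selected_start_frame=260, boundary_frame=250))  # True
    print(should_pad(selected_start_frame=900, boundary_frame=250))  # False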

In the example of S1006, output video data segment DE′ is not within the predetermined proximity relationship with the at least one boundary point, and therefore one or more additional video frames are not included and the output video data segment DE′ is not extended to a boundary. Output video data segment B′ is within the predetermined proximity relationship with the at least one boundary point, in this case the boundary between A and B. Therefore, one or more additional frames which do not include the one or more additional objects will be included in the output data segment B′.

As with S506, information on where the segments of the output video data are to be placed is transmitted to the distributor in the metadata.

S1008 shows the final video data, with segments of the output video data, B′, D′ and E′, replacing segments of the source video data, B, D and E.

In this embodiment, as the whole of a segment is not being replaced, one or more additional IDR frames are created and inserted in front of each newly created segment. An example of where the new IDR frames are required is represented by X and Y in FIG. 10. The new IDR frames are created by the distributor.

CLAIMS

1.-20. (canceled)
 21. A method of incorporating video objects into a source video to produce an output video, the method comprising: receiving the source video that includes a plurality of video frames, the source video being divided into a plurality of video segments each having a uniform segment length; analyzing the source video to identify one or more selected video frames of the plurality of video frames that include one or more insertion zones that are suitable for receiving a video object; identifying one or more video segments of the plurality of video segments in which the one or more selected video frames are found; identifying a boundary point within the source video, the boundary point corresponding with a beginning video frame in the source video corresponding with the first video frame of the one or more video segments; and creating the output video comprising the video object embedded within the one or more selected video frames and corresponding with the boundary point.
 22. The method according to claim 21, wherein the output video has a length that is a multiple of the segment length.
 23. The method according to claim 21, further comprising generating metadata that includes information specifying the boundary point of the source video.
 24. The method according to claim 21, wherein the boundary point of the source video comprises an Instantaneous Decoder Refresh (IDR) frame.
 25. The method according to claim 21, further comprising creating metadata that includes a tracking pixel that transmits a signal in response to an embedded segment being played.
 26. The method according to claim 25, further comprising transmitting the output video and the metadata to a distributor.
 27. The method according to claim 21, further comprising: encoding a first version of the output video with a first bitrate; and encoding a second version of the output video with a second bitrate that is different than the first bitrate.
 28. The method according to claim 21, wherein the segment length is defined by a time period.
 29. The method according to claim 21, wherein the segment length is defined by a number of video frames.
 30. The method according to claim 21, wherein the output video is divided into segments for use in adaptive bitrate streaming.
 31. The method according to claim 21, further comprising creating metadata that includes information on the length of the output video.
 32. The method according to claim 21, further comprising creating metadata that includes a Video Multiple Ad Playlist (VMAP) file.
 33. The method according to claim 21, further comprising: identifying a second boundary point within the source video, the second boundary point corresponding with a video frame in the source video corresponding with the last video frame of the one or more video segments, and wherein the output video is bounded by the boundary point and the second boundary point.
 34. A non-transitory computer-readable medium having computer executable instructions stored thereon, which when executed by a computing device cause the computing device to perform the method according to claim 21.
 35. A method comprising: receiving a source video that includes a plurality of video frames, the source video being divided into a plurality of video segments, each video segment of the plurality of video segments having a uniform segment length; receiving a video object; analyzing the source video to identify one or more selected video frames of the plurality of video frames that include one or more insertion zones that are suitable for receiving a video object; identifying one or more selected video segments of the plurality of video segments in which the one or more selected video frames are found; identifying a first boundary point within the source video, the first boundary point identifying a point in time in the source video corresponding with the earliest video frame of the one or more selected video segments; identifying a second boundary point within the source video, the second boundary point identifying a point in time in the source video corresponding with the last video frame of the one or more selected video segments; and creating an output video comprising the video object embedded within the one or more selected video frames, the output video starting with the first boundary point and ending with the second boundary point.
 36. The method according to claim 35, wherein the output video has a length that is a multiple of the segment length.
 37. The method according to claim 35, wherein the first boundary point of the source video comprises an Instantaneous Decoder Refresh (IDR) frame.
 38. The method according to claim 35, wherein a frame adjacent with the second boundary point of the source video comprises an Instantaneous Decoder Refresh (IDR) frame.
 39. The method according to claim 35, further comprising: encoding a first version of the output video with a first bitrate; and encoding a second version of the output video with a second bitrate that is different than the first bitrate.
 40. A method comprising: receiving a source video that includes a plurality of video frames, the source video being divided into a plurality of video segments, each video segment of the plurality of video segments having a uniform segment length; receiving a video object; analyzing the source video to identify one or more selected video frames of the plurality of video frames that include one or more insertion zones that are suitable for receiving a video object; identifying one or more selected video segments of the plurality of video segments in which the one or more selected video frames are found; and creating an output video comprising the video object embedded within the one or more selected video frames of the selected video segments, the output video having a length that is a multiple of the segment length.